AMENDMENTS TO THE CLAIMS 



1. (Previously presented) A computer-implemented method for selectively
accessing a document during a current crawl of a server computer, the document being identified 
by a document address specification, the document having been retrieved during a previous 
crawl, the method comprising: 

determining whether to access the document during the current crawl with the aid of a 
probabilistic model that is based on the probability that the document has changed since the 
previous crawl; and 

accessing the document if the determination produces an instruction indicative that the 
document at the document address specification should be accessed during the current crawl. 

2. (Previously presented) The method of Claim 1, wherein determining whether to 
access the document with the aid of a probabilistic model comprises computing a probability that 
the document has changed since the document was retrieved during the previous crawl. 

3. (Previously presented) The method of Claim 2, wherein computing the 
probability that a document has changed comprises: 

selecting an active probability indicative of a proportion of documents in a plurality of 
documents that are changing at various change rates, the plurality of documents including the 
document; 

training the active probability to reflect experience with the document during a plurality 
of previous crawls; and 

using the trained active probability to compute the probability that the document has 
changed. 

4. (Original) The method of Claim 3, further comprising: 

selecting the probability that the document has changed from the previous crawl as the 
active probability in the current crawl; and 

repeating the method of Claim 3 for the current crawl. 

5. (Previously presented) The method of Claim 3, wherein training the active 
probability includes multiplying the active probability indicative of a change in the document by 
a training probability calculated using a probabilistic model. 

6. (Previously presented) The method of Claim 1, wherein the probabilistic model
further comprises: 

training a document probability distribution corresponding to the document address 
specification to reflect experience with the document during a plurality of previous crawls, the 
document probability distribution including a plurality of probabilities; 

determining from the document probability distribution a probability that the document 
has changed; and 

making a determination of whether to access the document in a current crawl based on 
the probability that the document has changed. 

7. (Previously presented) The method of Claim 6, further comprising:

calculating, based on the experience with the document during a plurality of previous
crawls, a discrete random variable distribution that includes a plurality of training probabilities;
and

multiplying each probability in the document probability distribution by a corresponding 
training probability from the discrete random variable distribution. 

8. (Original) The method of Claim 7, wherein the training probabilities are 
calculated using a Poisson process, the Poisson process including a Poisson equation (e^(-r*dt))
and a complementary Poisson equation (1-e^(-r*dt)).
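
The training step recited in Claims 6 through 8 can be illustrated with a minimal sketch. It assumes a small set of candidate change rates r, each with a probability in the document probability distribution, and an observation interval dt between crawls. The function names, the example rates, and the renormalization step are assumptions made for illustration only and are not recited in the claims.

```python
import math

def train_distribution(probs, rates, dt, changed):
    """Multiply each probability by its Poisson training probability.

    If the document was observed unchanged over dt, the training
    probability is e^(-r*dt); if it changed, it is 1 - e^(-r*dt).
    The renormalization at the end is an assumption of this sketch.
    """
    trained = []
    for p, r in zip(probs, rates):
        unchanged_prob = math.exp(-r * dt)
        training = (1.0 - unchanged_prob) if changed else unchanged_prob
        trained.append(p * training)
    total = sum(trained)
    if total == 0.0:
        # Degenerate case: keep the prior rather than divide by zero.
        return list(probs)
    return [t / total for t in trained]

def probability_changed(probs, rates, dt):
    """Probability that the document changed within dt, averaged over rates."""
    return sum(p * (1.0 - math.exp(-r * dt)) for p, r in zip(probs, rates))

# Hypothetical example: three change rates (changes per day), uniform prior.
rates = [0.01, 0.1, 1.0]
prior = [1.0 / 3.0] * 3
# The document was found unchanged after 7 days.
trained = train_distribution(prior, rates, dt=7.0, changed=False)
```

In this sketch, observing no change shifts weight toward the slower change rates, so the probability computed from the trained distribution is lower than the one computed from the prior.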

9. (Original) The method of Claim 8, wherein the experience with the document 
during the plurality of previous crawls is derived from historical information associated with the 
document address specification. 

10. (Currently amended) A computer-readable medium having computer-executable 
instructions for retrieving one document in a plurality of documents from a remote server, which 
when executed comprise: 

maintaining historical information associated with changes to the one document; 
initiating a crawl procedure for retrieving particular documents in the plurality of 
documents; and 

determining whether to access the one document from the remote server based on a 
probabilistic analysis of the historical information associated with the changes to the one 
document, said probabilistic analysis of the historical information being based on the probability
that the one document has changed since a previous crawl.

11. (Original) The computer-readable medium of Claim 10, further comprising:

if the determination to access the one document is positive, identifying the one document 
for retrieval during the crawl procedure; and 

attempting to retrieve all documents identified for retrieval during the crawl procedure. 

12. (Previously presented) The computer-readable medium of Claim 10, wherein the 
probabilistic analysis comprises: 

computing a probability that the one document has changed since the one document was 
last retrieved from the remote server. 

13. (Original) The computer-readable medium of Claim 12, wherein computing the 
probability that the one document has changed further comprises: 

beginning with a probability that a pre-defined proportion of documents in the plurality 
of documents has changed, training the probability that the pre-defined proportion of documents 
has changed using the historical information associated with the one document to achieve the 
probability that the one document has changed. 

14. (Original) The computer-readable medium of Claim 12, further comprising 
making a random decision to retrieve the one document wherein the random decision is biased 
by the probability that the one document has changed. 

15. (Original) The computer-readable medium of Claim 14, wherein the random 
decision is further biased by a synchronization level configured to influence the random decision 
based on a predetermined degree of tolerance for not retrieving the one document if the 
document is likely to have changed. 

16. (Original) The computer-readable medium of Claim 14, wherein the random 
decision is made by a software routine adapted to simulate a flip of a coin. 
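
The biased random decision of Claims 14 through 16 can be sketched as a weighted coin flip. The claims do not state how the synchronization level is combined with the probability of change; the multiplicative weighting below, the parameter names, and the clamping to the range 0 to 1 are assumptions chosen only to make the example concrete.

```python
import random

def should_retrieve(p_changed, sync_level=1.0, rng=random):
    """Simulate a biased coin flip deciding whether to retrieve a document.

    p_changed:  probability that the document has changed.
    sync_level: hypothetical weighting; values above 1.0 lower the tolerance
                for missing a changed document, values below 1.0 raise it.
    rng:        any object with a random() method returning a float in [0, 1).
    """
    bias = max(0.0, min(1.0, p_changed * sync_level))
    return rng.random() < bias

# Hypothetical usage with a seeded generator so the run is repeatable.
rng = random.Random(42)
decisions = [should_retrieve(0.3, rng=rng) for _ in range(10000)]
retrieved_fraction = sum(decisions) / len(decisions)
```

Over many flips the fraction of positive decisions approaches the bias, so a document with a low probability of change is still retrieved occasionally.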

17. (Previously presented) The computer-readable medium of Claim 10, wherein:

the historical information associated with changes to the one document includes a time
stamp for the one document, the time stamp being indicative of the time that the one document
was last modified when the one document was last retrieved from the remote server; and

the probabilistic analysis includes a comparison of the time stamp included in the 
historical information with another time stamp associated with the one document stored on the 
remote server. 

18. (Original) The computer-readable medium of Claim 17, further comprising:

if the time stamp included in the historical information does not match the other time 
stamp associated with the one document stored on the remote server, identifying the one 
document for retrieval during the crawl procedure. 

19. (Previously presented) The computer-readable medium of Claim 10, wherein:

the historical information associated with changes to the one document includes a hash
value associated with the one document, the hash value being a representation of the one
document; and

the probabilistic analysis includes a comparison of the hash value included in the 
historical information with another hash value calculated from information retrieved from the 
one document stored on the remote server. 

20. (Original) The computer-readable medium of Claim 19, if the hash value 
included in the historical information does not match the other hash value associated with the 
one document stored on the remote server, identifying the one document for retrieval during the 
crawl procedure. 
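
The time stamp and hash value comparisons of Claims 17 through 20 can be sketched as follows. The claims do not name a hash function or a time stamp format; the use of SHA-256, the string time stamps, and all function names here are assumptions for illustration.

```python
import hashlib

def content_hash(content: bytes) -> str:
    """Return a hash value representing the document (SHA-256 is assumed)."""
    return hashlib.sha256(content).hexdigest()

def changed_by_hash(stored_hash: str, fetched_content: bytes) -> bool:
    """True if the stored hash does not match the hash of the fetched content."""
    return stored_hash != content_hash(fetched_content)

def changed_by_timestamp(stored_last_modified: str, remote_last_modified: str) -> bool:
    """True if the stored time stamp does not match the one on the remote server."""
    return stored_last_modified != remote_last_modified

# Hypothetical historical information recorded during a previous crawl.
history = {
    "hash": content_hash(b"<html>old body</html>"),
    "last_modified": "Tue, 01 Jan 2002 00:00:00 GMT",
}
# A mismatch in either comparison identifies the document for retrieval.
identify_for_retrieval = (
    changed_by_timestamp(history["last_modified"], "Wed, 02 Jan 2002 00:00:00 GMT")
    or changed_by_hash(history["hash"], b"<html>new body</html>")
)
```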